1 Low load latency through sum-addressed memory (SAM)
William L. Lynch, Gary Lauterbach, Joseph I. Chamdani

April 1998 ACM SIGARCH Computer Architecture News, Proceedings of the 25th
annual international symposium on Computer architecture, Volume 26 Issue 3

Full text available: pdf(940.38 KB), Publisher Site Additional Information: full citation, abstract, references, citings, index terms

Load latency contributes significantly to execution time. Because most cache accesses hit, 
cache-hit latency becomes an important component of expected load latency. Most modern 
microprocessors have base+offset addressing loads; thus effective cache-hit latency 
includes an addition as well as the RAM access. This paper introduces a new technique used
in the UltraSPARC III microprocessor, Sum-Addressed Memory (SAM), which performs true 
addition using the decoder of the RAM array, with very low lat ... 



2 Equivalence verification: Automated equivalence checking of switch level circuits 
Simon Jolly, Atanas Parashkevov, Tim McDougall 

June 2002 Proceedings of the 39th conference on Design automation 

Full text available: pdf(220.14 KB) Additional Information: full citation, abstract, references, index terms

A chip that is required to meet strict operating criteria in terms of speed, power, or area is
commonly custom designed at the switch level. Traditional techniques for verifying these 
designs, based on simulation, are expensive in terms of resources and cannot completely 
guarantee correct operation. Formal verification methods, on the other hand, provide for a 
complete proof of correctness, and require less effort to set up. This paper presents
Motorola's Switch Level Verification (SLV) tool, whi ... 

Keywords: MOS circuits, VLSI design, custom design, equivalence checking, formal 
verification, switch level analysis 



3 System architectures for computer music 
John W. Gordon 

June 1985 ACM Computing Surveys (CSUR), Volume 17 Issue 2

Full text available: pdf(4.61 MB) Additional Information: full citation, abstract, references, citings, index terms, review

Computer music is a relatively new field. While a large proportion of the public is aware of 
computer music in one form or another, there seems to be a need for a better 
understanding of its capabilities and limitations in terms of synthesis, performance, and 
recording hardware. This article addresses that need by surveying and discussing the 
architecture of existing computer music systems. System requirements vary according to 
what the system will be used for. Common uses for co ... 

4 Graphics rendering architecture for a high performance desktop workstation
Chandlee B. Harrell, Farhad Fouladi 

September 1993 Proceedings of the 20th annual conference on Computer graphics and 
interactive techniques 

Full text available: pdf(346.15 KB) Additional Information: full citation, references, citings, index terms



5 A performance analysis of PIM, stream processing, and tiled processing on
memory-intensive signal processing kernels
Jinwoo Suh, Eun-Gyu Kim, Stephen P. Crago, Lakshmi Srinivasan, Matthew C. French

May 2003 ACM SIGARCH Computer Architecture News, Proceedings of the 30th
annual international symposium on Computer architecture, Volume 31 Issue 2

Full text available: pdf(239.50 KB) Additional Information: full citation, abstract, references

Trends in microprocessors of increasing die size and clock speed and decreasing feature
sizes have fueled rapidly increasing performance. However, the limited improvements in 
DRAM latency and bandwidth and diminishing returns of increasing superscalar ILP and 
cache sizes have led to the proposal of new microprocessor architectures that implement 
processor-in-memory, stream processing, and tiled processing. Each architecture is
typically evaluated separately and compared to a baseline architectu ... 

6 System-level power optimization: techniques and tools 
Luca Benini, Giovanni de Micheli 

April 2000 ACM Transactions on Design Automation of Electronic Systems (TODAES),
Volume 5 Issue 2

Full text available: pdf(385.22 KB) Additional Information: full citation, abstract, references, citings, index terms

This tutorial surveys design methods for energy-efficient system-level design. We consider 
electronic systems consisting of a hardware platform and software layers. We consider the
three major constituents of hardware that consume energy, namely computation,
communication, and storage units, and we review methods of reducing their energy
consumption. We also study models for analyzing the energy cost of software, and methods
for energy-efficient software design and compilation. This survey ...



7 Data and memory optimization techniques for embedded systems
P. R. Panda, F. Catthoor, N. D. Dutt, K. Danckaert, E. Brockmeyer, C. Kulkarni, A.
Vandercappelle, P. G. Kjeldsberg

April 2001 ACM Transactions on Design Automation of Electronic Systems (TODAES),
Volume 6 Issue 2

Full text available: pdf(339.91 KB) Additional Information: full citation, abstract, references, citings, index terms

We present a survey of the state-of-the-art techniques used in performing data and 
memory-related optimizations in embedded systems. The optimizations are targeted directly 
or indirectly at the memory subsystem, and impact one or more out of three important cost 
metrics: area, performance, and power dissipation of the resulting implementation. We first 
examine architecture-independent optimizations in the form of code transformations. We
next cover a broad spectrum of optimizati ... 

Keywords: DRAM, SRAM, address generation, allocation, architecture exploration, code 
transformation, data cache, data optimization, high-level synthesis, memory architecture 
customization, memory power dissipation, register file, size estimation, survey 



8 OMP: a RISC-based multiprocessor using orthogonal-access memories and multiple
spanning buses
K. Hwang, M. Dubois, D. K. Panda, S. Rao, S. Shang, A. Uresin, W. Mao, H. Nair, M. Lytwyn, F.
Hsieh, J. Liu, S. Mehrotra, C. M. Cheng

June 1990 ACM SIGARCH Computer Architecture News, Proceedings of the 4th
international conference on Supercomputing, Volume 18 Issue 3

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

This paper presents the architectural design and RISC based implementation of a prototype 
supercomputer, namely the Orthogonal Multiprocessor (OMP). The OMP system is 
constructed with 16 Intel i860 RISC microprocessors and 256 parallel memory modules,
which are 2-D interleaved and orthogonally accessed using custom-designed spanning 
buses. The architectural design has been validated by a CSIM-based multiprocessor 
simulator. The design choices are based on worst-case delay a ... 

9 Computational models: BLOB computing 

Frederic Gruau, Yves Lhuillier, Philippe Reitz, Olivier Temam 

April 2004 Proceedings of the first conference on computing frontiers on Computing 
frontiers 

Full text available: pdf(1.02 MB) Additional Information: full citation, abstract, references, index terms

Current processor and multiprocessor architectures are almost all based on the Von 
Neumann paradigm. Based on this paradigm, one can build a general-purpose computer 
using very few transistors, e.g., 2250 transistors in the first Intel 4004 microprocessor. In 
other terms, the notion that on-chip space is a scarce resource is at the root of this 
paradigm which trades on-chip space for program execution time. Today, technology 
considerably relaxed this space constraint. Still, few research works q ... 

Keywords: bio-inspiration, cellular automata, scalable architectures 



10 Inverse polarity techniques for high-speed/low-power multipliers
Pascal C. H. Meier, Rob A. Rutenbar, L. Richard Carley 

August 1999 Proceedings of the 1999 international symposium on Low power 
electronics and design 

Full text available: pdf(341.38 KB) Additional Information: full citation, references, index terms



Keywords: inverse polarity, low power, multiplier 



11 Performance comparison of ILP machines with cycle time evaluation
Tetsuya Hara, Hideki Ando, Chikako Nakanishi, Masao Nakaya

May 1996 ACM SIGARCH Computer Architecture News, Proceedings of the 23rd
annual international symposium on Computer architecture, Volume 24 Issue 2

Full text available: pdf(1.48 MB) Additional Information: full citation, abstract, references, citings, index terms

Many studies have investigated performance improvement through exploiting instruction- 
level parallelism (ILP) with a particular architecture. Unfortunately, these studies indicate 
performance improvement using the number of cycles that are required to execute a 
program, but do not quantitatively estimate the penalty imposed on the cycle time from the 
architecture. Since the performance of a microprocessor must be measured by its execution 
time, a cycle time evaluation is required as well as a cy ...

12 IS '97: model curriculum and guidelines for undergraduate degree programs in
information systems
Gordon B. Davis, John T. Gorgone, J. Daniel Couger, David L. Feinstein, Herbert E.
Longenecker

December 1996 ACM SIGMIS Database, Guidelines for undergraduate degree programs
on Model curriculum and guidelines for undergraduate degree
programs in information systems, Volume 28 Issue 1

Full text available: pdf(7.24 MB) Additional Information: full citation, citings

13 Parallel logic simulation of VLSI systems
Mary L. Bailey, Jack V. Briner, Roger D. Chamberlain

September 1994 ACM Computing Surveys (CSUR), Volume 26 Issue 3

Full text available: pdf(3.74 MB) Additional Information: full citation, abstract, references, citings, index terms

Fast, efficient logic simulators are an essential tool in modern VLSI system design. Logic 
simulation is used extensively for design verification prior to fabrication, and as VLSI 
systems grow in size, the execution time required by simulation is becoming more and more 
significant. Faster logic simulators will have an appreciable economic impact, speeding time 
to market while ensuring more thorough system design testing. One approach to this 
problem is to utilize parallel processing, taking ... 

Keywords: circuit structure, parallel architecture, parallelism, partitioning, synchronization 
algorithm, timing granularity



14 From VHDL to efficient and first-time-right designs: a formal approach
Peter F. A. Middelhoek, Sreeranga P. Rajan

April 1996 ACM Transactions on Design Automation of Electronic Systems (TODAES),
Volume 1 Issue 2

Full text available: pdf(722.99 KB) Additional Information: full citation, abstract, references, citings, index terms

In this article we provide a practical transformational approach to the synthesis of correct
synchronous digital hardware designs from high-level specifications. We do this while taking
into account the complete life cycle of a design from early prototype to full custom
implementation. Besides time-to-market, both flexibility with respect to target architecture 
and efficiency issues are addressed by the methodology. The utilization of user-selected 
behavior-preserving transformation steps e ... 

Keywords: CDFG, SFG, VHDL, correctness by construction, design methodology, rapid 
system prototyping, transformational design 

15 Modeling layout tools to derive forward estimates of area and delay at the RTL level
Donald S. Gelosh, Dorothy E. Setliff

July 2000 ACM Transactions on Design Automation of Electronic Systems (TODAES),
Volume 5 Issue 3

Full text available: pdf(278.32 KB) Additional Information: full citation, abstract, references, index terms

Forward estimates of area and delay facilitate effective decision-making when searching the 
solution space of digital designs. Current estimation techniques focus on modeling the layout 
result and fail to deliver timely or accurate estimates. This paper presents a novel approach 
to deriving these area and delay estimates at the RTL level by modeling the layout tool, 
rather than the layout result. This approach uses machine learning techniques to capture the 
relationships between general des ... 

Keywords: VLSI CAD, estimation, estimation techniques, layout, machine learning 



16 Reconfigurable computing: architectures and applications: A reconfigurable unit for a
clustered programmable-reconfigurable processor
Richard B. Kujoth, Chi-Wei Wang, Derek B. Gottlieb, Jeffrey J. Cook, Nicholas P. Carter

February 2004 Proceedings of the 2004 ACM/SIGDA 12th international symposium on
Field programmable gate arrays

Full text available: pdf(1.37 MB) Additional Information: full citation, abstract, references, index terms

In a clustered programmable-reconfigurable processor, multiple programmable processors 
and blocks of reconfigurable logic communicate through a register-based communication
mechanism, which reduces the impact of wire delay on clock cycle time. In this paper, we 
present a circuit-level design for the reconfigurable clusters used on the Amalgam 
programmable-reconfigurable processor. We outline our interleaved reconfigurable array 
design, which provides high bandwidth to and from the register file ... 

Keywords: FPGA, reconfigurable processor, technology scaling 



17 Cellular and Cryptographic Applications: Application of FPGA technology to accelerate
the finite-difference time-domain (FDTD) method
Ryan N. Schneider, Laurence E. Turner, Michal M. Okoniewski

February 2002 Proceedings of the 2002 ACM/SIGDA tenth international symposium on
Field-programmable gate arrays

Full text available: pdf(463.90 KB) Additional Information: full citation, abstract, references, citings

The continuing advances in the field of electrical engineering, in areas like cellular 
communications, fiber optics, mobile and multi-gigahertz electronics have necessitated a
computer-assisted design approach to the complex electromagnetic interactions and 
problems that arise. Finite-Difference Time-Domain (FDTD) Analysis is a very powerful tool 
for the modeling of electromagnetic phenomena. The algorithm is computationally intensive 
and simulations can run for a few hours to several days. Incr ... 

18 Using general-purpose programming languages for FPGA design
Brad L. Hutchings, Brent E. Nelson

June 2000 Proceedings of the 37th conference on Design automation

Full text available: pdf(287.38 KB) Additional Information: full citation, abstract, references, citings, index terms

General-purpose programming languages (GPL) are effective vehicles for FPGA design
because they are easy to use, extensible, widely available, and can be used to describe both 
the hardware and software aspects of a design. The strengths of the GPL approach to circuit 
design have been demonstrated by JHDL, a Java-based circuit design environment used to 
develop several large FPGA-based applications at several institutions. Major strengths of the
JHDL environment include a common run-time for ... 

19 Measurement and evaluation of the MIPS architecture and processor
Thomas R. Gross, John L. Hennessy, Steven A. Przybylski, Christopher Rowen

August 1988 ACM Transactions on Computer Systems (TOCS), Volume 6 Issue 3

Full text available: pdf(2.30 MB) Additional Information: full citation, abstract, references, citings, index terms, review

MIPS is a 32-bit processor architecture that has been implemented as an nMOS VLSI chip. 
The instruction set architecture is RISC-based. Close coupling with compilers and efficient 
use of the instruction set by compiled programs were goals of the architecture. The MIPS 
architecture requires that the software implement some constraints in the design that are 
normally considered part of the hardware implementation. This paper presents experimental 
results on the effectiveness of this processor ... 

20 An object-oriented cell library manager
Naresh K. Sehgal, C. Y. Roger Chen, John M. Acken

November 1994 Proceedings of the 1994 IEEE/ACM international conference on
Computer-aided design

Full text available: pdf(469.77 KB) Additional Information: full citation, references, index terms

1 Parallel logic simulation of VLSI systems
Mary L. Bailey, Jack V. Briner, Roger D. Chamberlain

September 1994 ACM Computing Surveys (CSUR), Volume 26 Issue 3

Full text available: pdf(3.74 MB) Additional Information: full citation, abstract, references, citings, index terms



Fast, efficient logic simulators are an essential tool in modern VLSI system design. Logic 
simulation is used extensively for design verification prior to fabrication, and as VLSI 
systems grow in size, the execution time required by simulation is becoming more and more 
significant. Faster logic simulators will have an appreciable economic impact, speeding time 
to market while ensuring more thorough system design testing. One approach to this 
problem is to utilize parallel processing, taking ...

Keywords: circuit structure, parallel architecture, parallelism, partitioning, synchronization 
algorithm, timing granularity 



2 Inverse polarity techniques for high-speed/low-power multipliers 
Pascal C. H. Meier, Rob A. Rutenbar, L. Richard Carley

August 1999 Proceedings of the 1999 international symposium on Low power
electronics and design

Full text available: pdf(341.38 KB) Additional Information: full citation, references, index terms



Keywords: inverse polarity, low power, multiplier 



3 Equivalence verification: Automated equivalence checking of switch level circuits 
Simon Jolly, Atanas Parashkevov, Tim McDougall 

June 2002 Proceedings of the 39th conference on Design automation 

Full text available: pdf(220.14 KB) Additional Information: full citation, abstract, references, index terms

A chip that is required to meet strict operating criteria in terms of speed, power, or area is
commonly custom designed at the switch level. Traditional techniques for verifying these 
designs, based on simulation, are expensive in terms of resources and cannot completely 
guarantee correct operation. Formal verification methods, on the other hand, provide for a 
complete proof of correctness, and require less effort to set up. This paper presents
Motorola's Switch Level Verification (SLV) tool, whi ... 

Keywords: MOS circuits, VLSI design, custom design, equivalence checking, formal 
verification, switch level analysis 

4 Symbolic functional and timing verification of transistor-level circuits
Clayton B. McDonald, Randal E. Bryant

November 1999 Proceedings of the 1999 IEEE/ACM international conference on
Computer-aided design

Full text available: pdf(101.06 KB) Additional Information: full citation, abstract, references, citings, index terms

We introduce a new method of verifying the timing of custom CMOS circuits. Due to the 
exponential number of patterns required, traditional simulation methods are unable to 
exhaustively verify a medium-sized modern logic block. Static analysis can handle much 
larger circuits but is not robust with respect to variations from standard circuit structures.
Our approach applies symbolic simulation to analyze a circuit over all input combinations 
without these limitations. We present a prototype s ... 

5 Reconfigurable computing: architectures and applications: A reconfigurable unit for a
clustered programmable-reconfigurable processor
Richard B. Kujoth, Chi-Wei Wang, Derek B. Gottlieb, Jeffrey J. Cook, Nicholas P. Carter

February 2004 Proceedings of the 2004 ACM/SIGDA 12th international symposium on
Field programmable gate arrays

Full text available: pdf(1.37 MB) Additional Information: full citation, abstract, references, index terms

In a clustered programmable-reconfigurable processor, multiple programmable processors 
and blocks of reconfigurable logic communicate through a register-based communication 
mechanism, which reduces the impact of wire delay on clock cycle time. In this paper, we 
present a circuit-level design for the reconfigurable clusters used on the Amalgam 
programmable-reconfigurable processor. We outline our interleaved reconfigurable array 
design, which provides high bandwidth to and from the register file ... 

Keywords: FPGA, reconfigurable processor, technology scaling 



6 An object-oriented cell library manager 

Naresh K. Sehgal, C. Y. Roger Chen, John M. Acken 

November 1994 Proceedings of the 1994 IEEE/ACM international conference on
Computer-aided design

Full text available: pdf(469.77 KB) Additional Information: full citation, references, index terms



7 Session 8C: advances in layout and synthesis: Layout-driven area-constrained timing 
optimization by net buffering 

Rajeev Murgai

November 2000 Proceedings of the 2000 IEEE/ACM international conference on
Computer-aided design

Full text available: pdf(252.73 KB) Additional Information: full citation, abstract, references

With the advent of deep sub-micron technologies, interconnect loads and delays are 
becoming significant, and layout-driven synthesis has become the need of the day. 
However, given the tight constraints imposed by the layout (e.g., area availability, 
congestion), only those synthesis transforms can be made layout-driven that are local and 
layout-friendly. Examples of such transforms are net buffering, gate resizing, and gate 
replication. In this paper, we address the problem of minimizing the dela ... 

8 Self-test methodology for at-speed test of crosstalk in chip interconnects 
Xiaoliang Bai, Sujit Dey, Janusz Rajski

June 2000 Proceedings of the 37th conference on Design automation

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

The effect of crosstalk errors is most significant in high-performance circuits, mandating
at-speed testing for crosstalk defects. This paper describes a self-test methodology that we
have developed to enable on-chip at-speed testing of crosstalk defects in System-on-Chip 
interconnects. The self-test methodology is based on the Maximal Aggressor Fault Model 
[13], that enables testing of the interconnect with a linear number of test patterns. To 
enable self-testing of the interconnects, we h ... 

9 Low power and low voltage CMOS digital circuit techniques 
Christer Svensson, Atila Alvandpour 

August 1998 Proceedings of the 1998 international symposium on Low power 
electronics and design 

Full text available: pdf(491.28 KB) Additional Information: full citation, abstract, references, index terms

One of many important factors affecting power consumption is the choice of circuit 
technique for logic, latches and flip-flops. We analyze the power consumption at circuit level 
and use the results to guide the choice of circuit technique. Several types of latches and flip- 
flops are compared regarding power consumption and speed. Comparing logic clearly 
indicates that simple static logic in general has the lowest power consumption. Another
very important factor affecting power consumption ... 

Keywords: CMOS, digital circuits, low power, low voltage 



10 Library-less synthesis for static CMOS combinational logic circuits
S. Gavrilov, A. Glebov, S. Pullela, S. C. Moore, A. Dharchoudhury, R. Panda, G. Vijayan, D. T.
Blaauw

November 1997 Proceedings of the 1997 IEEE/ACM international conference on
Computer-aided design

Full text available: pdf(98.66 KB), Publisher Site Additional Information: full citation, abstract, references, citings, index terms

Traditional synthesis techniques optimize CMOS circuits in two phases: i) logic minimization 
and ii) library mapping phase. Typically, the structures and the sizes of the gates in the
library are chosen to yield good synthesis results over many blocks or even for an entire 
chip. Consequently this approach precludes an optimal design of individual blocks which may 
need custom structures. The authors present a new transistor level technique that optimizes 
CMOS circuits both structurally and size-w ... 

Keywords: CMOS logic circuits, circuit performance, design space, library-less synthesis, 
optimal design, resynthesized circuits, size-wise CMOS circuit optimization, static CMOS 
combinational logic circuits, structural CMOS circuit optimization, transistor level technique 



11 Achieving 550 MHz in an ASIC methodology
D. G. Chinnery, B. Nikolic, K. Keutzer

June 2001 Proceedings of the 38th conference on Design automation

Full text available: pdf(1.21 MB) Additional Information: full citation, abstract, references, citings, index terms

Typically, good automated ASIC designs may be two to five times slower than handcrafted 
custom designs. At last year's DAC this was examined and causes of the speed gap between 
custom circuits and ASICs were identified. In particular, faster custom speeds are achieved
by a combination of factors: good architecture with well-balanced pipelines; compact logic 
design; timing overhead minimization; careful floorplanning, partitioning and placement; 
dynamic logic; post-layout transistor and wire ... 

Keywords: ASIC, clock, comparison, custom, frequency, speed, throughput 

latency control
Luca Benini, Enrico Macii, Massimo Poncino

June 1997 Proceedings of the 34th annual conference on Design automation - Volume
00

Full text available: pdf(230.63 KB), Publisher Site Additional Information: full citation, abstract, references, citings, index terms

This paper presents a technique, alternative to performance-driven synthesis, that allows to
drastically increase the average throughput of combinational logic blocks by transforming
fixed-latency units into variable-latency ones that run with a faster clock cycle. The
transformation is fully automatic and can be used in conjunction with traditional design
techniques, such as pipelining, to improve the overall performance of
speed-critical systems. Results, obtained on a large set of benchmark circuits, a ...

13 Memory, control and communications synthesis for scheduled algorithms 
Douglas M. Grant, Peter B. Denyer 

January 1991 Proceedings of the 27th ACM/IEEE conference on Design automation 

Full text available: pdf(805.53 KB) Additional Information: full citation, abstract, references, citings, index terms

This paper explores a method of grouping individual memory requirements from a
hardware-constrained schedule of an algorithm, such that control and communications may 
be optimised. A new representation of memory requirements is introduced to explain the 
method. The technique may also be used to allocate operations to hardware resources. This, 
and control and communication optimisation are illustrated with an example. 

14 Memory binding for performance optimization of control-flow intensive behaviors
Kamal S. Khouri, Ganesh Lakshminarayana, Niraj K. Jha

November 1999 Proceedings of the 1999 IEEE/ACM international conference on
Computer-aided design

Full text available: pdf(164.71 KB) Additional Information: full citation, abstract, references, citings, index terms

This paper presents a memory binding algorithm for behaviors that are characterized by the 
presence of conditionals and deeply-nested loops that access memory extensively through 
arrays. Unlike previous works, this algorithm examines the effects of branch probabilities 
and allocation constraints. First, we demonstrate, through examples, the importance of 
incorporating branch probabilities and allocation constraint information when searching for a 
performance-efficient memory binding. We als ... 

15 A ultra fast Euclidean division algorithm for prime memory systems 
Benoit Dupont de Dinechin 

August 1991 Proceedings of the 1991 ACM/IEEE conference on Supercomputing 

Full text available: pdf(931.53 KB) Additional Information: full citation, references, citings, index terms



16 Increasing the effective bandwidth of complex memory systems in multivector 
processors 

Anna M. del Corral, Jose M. Llaberia 

November 1996 Proceedings of the 1996 ACM/IEEE conference on Supercomputing 
(CDROM) - Volume 00

Full text available: pdf(185.79 KB) Additional Information: full citation, abstract, references, index terms